Claims

1. A method for gene mapping from chromosome and phenotype data, which
utilizes linkage disequilibrium between genetic markers m_i, which are polymorphic
nucleic acid or protein sequences or strings of single-nucleotide polymorphisms
deriving from a chromosomal region, wherein

i) all marker patterns P that satisfy a pattern evaluation function e(P) are
searched from the data, wherein

a. the marker patterns are expressions involving the genetic markers and their
alleles and zero or more of the following: individual covariates, environmental
variables and auxiliary phenotypes; and

b. the pattern evaluation function e(P) involves some statistical measure of
the association between the marker pattern P and the phenotype being
studied,

ii) each marker m_i of the data is scored by a marker score s(m_i), which is a
function of the set S_i defined as the set of marker patterns overlapping the marker
m_i and satisfying the pattern evaluation function e as defined in step (i), and

iii) the location of the gene is predicted as a function of the scores s(m_i) of all the
markers m_i in the data and is based on maximizing the score if the scoring
function is designed to give higher scores closer to the gene, and on minimizing
the score if the scoring function is designed to give lower scores closer to
the gene, as is the case for instance when the scores s(m_i) are marker-wise p
values.

2. A method of claim 1, wherein the chromosome data consists of either haplotypes
or genotypes.

3. A method of claim 1, wherein the haplotypes and genotypes referred to in the
marker patterns contain flexible regions such as gaps or disjunctions.

4. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• (generalization) relation < for patterns in U

• where the function e and the relation < are such that if e(P) is true and P' < P,
then e(P') is also true

Output

• set S = {P in U | e(P) is true} of patterns

Method

S := {}
// Initialize the set of evaluated patterns:
E := {}
// Start with the most general patterns:
Gen := {P in U | there is no P' in U, P' != P, such that P' < P}
// Recursively evaluate patterns in a depth first order:
foreach P in Gen { evaluatePatterns(P) }
end;

procedure evaluatePatterns(P) {
    insert P into the set E
    if e(P) = true then {
        insert P into set S
        // Find all specializations of P that have not been tested yet, and
        // evaluate them recursively:
        Spec := {P' in U-E | P < P', P' != P, and there is no P" in U-E, P" != P
                 and P" != P', with P < P" < P'};
        foreach P' in Spec { evaluatePatterns(P'); }
    }
}
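By way of a non-limiting illustration, the depth-first search of claim 4 might be sketched in Python as follows. The function name search_depth_first, the callable is_generalization (standing in for the relation <) and the representation of U as a list of hashable patterns are assumptions of this sketch and are not part of the claim.

```python
def search_depth_first(universe, e, is_generalization):
    """Collect every pattern P in `universe` for which e(P) is true.

    Assumptions of this sketch:
    - is_generalization(a, b) is True when a is strictly more general than b;
    - e is monotone: if e(b) is true, e(a) is true for every generalization a of b.
    """
    selected, evaluated = set(), set()

    def evaluate(p):
        evaluated.add(p)
        if not e(p):
            return  # by monotonicity, no specialization of p can satisfy e
        selected.add(p)
        untested = [q for q in universe if q not in evaluated]
        # Most general specializations of p among the untested patterns.
        spec = [
            q for q in untested
            if is_generalization(p, q)
            and not any(
                is_generalization(p, r) and is_generalization(r, q)
                for r in untested
            )
        ]
        for q in spec:
            if q not in evaluated:  # guard against re-evaluation
                evaluate(q)

    most_general = [
        p for p in universe
        if not any(is_generalization(q, p) for q in universe)
    ]
    for p in most_general:
        if p not in evaluated:
            evaluate(p)
    return selected
```

With patterns modelled as frozensets and "more general" taken to mean "proper subset", a predicate such as "P is contained in {a, b}" is monotone, so the search never descends below a rejected pattern.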

5. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• frequency threshold x

Output

• set S = {P in U | e(P) and ae(P) is true} of patterns, where ae(P) is true if and
only if the frequency of pattern P exceeds a given threshold x

Method

S := {}
// Initialize the set of evaluated patterns:
E := {}
// Start with the most general patterns:
Gen := {P in U | there is no P' in U, P' != P, such that P -> P'}
// Recursively evaluate patterns in a depth first order:
foreach P in Gen { evaluatePatterns(P) }
end

procedure evaluatePatterns(P) {
    insert P into the set E
    if ae(P) = true then {
        if e(P) = true then insert P into set S
        // Find all specializations of P that have not been tested yet, and evaluate
        // them recursively:
        Spec := {P' in U-E | P' -> P, P' != P, and there is no P" in U-E, P" != P
                 and P" != P', with P' -> P" and P" -> P}
        foreach P' in Spec { evaluatePatterns(P') }
    }
}

6. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• marker map M = {m_1, ..., m_k}

• phenotype vector Y = (Y_1, ..., Y_n)

• haplotype matrix H of size n * k

• association threshold x for chi-squared test

• maximum pattern length l

• maximum number of gaps g

• maximum gap size s

Output

• set S = {P in U | e(P) is true} of patterns,

• where U consists of patterns on M that consist of marker-allele assignments and
that adhere to parameters l, g, and s, and

• where e(P) is true if and only if chi-squared test on P using haplotype matrix H
and phenotypes Y exceeds the given threshold x

Method

S := {}
// Number of case and control chromosomes:
pi_A := number of disease-associated chromosomes;
pi_C := number of control chromosomes;
pi := pi_A + pi_C
// A lower bound for pattern frequency:
lb := pi_A * pi * x / (pi_C * pi + pi_A * x)
// Variable for iterating over different patterns:
P = (p_1, ..., p_k) := ('*', ..., '*')
for i := 1 to k {
    // alleles(m_i) is the set of alleles of the i:th marker
    foreach a in alleles(m_i) {
        p_i := a
        // Test pattern P and all its extensions:
        checkPatterns(P, i, i, 0, 0)
        // Reset p_i
        p_i := '*'
    }
}
end

// Test haplotype pattern P and all patterns that can be generated by extending P
// from the right:
procedure checkPatterns(P, start, i, nr_of_gaps, gap_length) {
    // Output strongly associated patterns
    if chi-squared(P, M, H, Y) >= x and p_i != '*' then insert P into set S
    // Return if extended patterns would be too long:
    if i = k or i+1-start > l then return
    // Return if extended patterns can not be strongly disease-associated:
    if frequency of P in disease-associated chromosomes is less than lb
    then return;
    // Create and test legal extensions of current pattern P (3 cases):
    // 1. Give marker i+1 all possible values:
    foreach a in alleles(m_{i+1}) {
        p_{i+1} := a
        checkPatterns(P, start, i+1, nr_of_gaps, 0)
    }
    // 2. Introduce a new gap starting at marker i+1:
    if p_i != '*' and nr_of_gaps < g and s > 1 then {
        p_{i+1} := '*'
        checkPatterns(P, start, i+1, nr_of_gaps+1, 1)
    }
    // 3. Extend the current gap over marker i+1:
    if p_i = '*' and gap_length < s then {
        p_{i+1} := '*'
        checkPatterns(P, start, i+1, nr_of_gaps, gap_length+1)
    }
    // Before returning, reset p_{i+1}:
    p_{i+1} := '*'
    return
}
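For illustration only, the recursive pattern extension of claim 6 might be sketched in Python as below. Several points are assumptions of the sketch rather than features of the claim: haplotypes are strings with one character per marker, the phenotype is a list of booleans (True for a disease-associated chromosome), the statistic is a signed 2x2 chi-squared (positive when the pattern is enriched among cases), the length and gap-size bounds are the sketch's own reading of the parameters l and s, and the names matches, signed_chi2 and mine_patterns are hypothetical.

```python
def matches(pattern, hap):
    """True if the haplotype agrees with every non-'*' position of the pattern."""
    return all(p == "*" or p == h for p, h in zip(pattern, hap))

def case_control_counts(pattern, haplotypes, is_case):
    a = sum(1 for hap, c in zip(haplotypes, is_case) if c and matches(pattern, hap))
    b = sum(1 for hap, c in zip(haplotypes, is_case) if not c and matches(pattern, hap))
    return a, b

def signed_chi2(pattern, haplotypes, is_case):
    """2x2 chi-squared statistic, negated when the pattern is depleted in cases."""
    n = len(is_case)
    n_case = sum(1 for c in is_case if c)
    a, b = case_control_counts(pattern, haplotypes, is_case)
    c, d = n_case - a, (n - n_case) - b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2 if a * d >= b * c else -chi2

def mine_patterns(haplotypes, is_case, x, max_len, max_gaps, max_gap_size):
    k = len(haplotypes[0])
    alleles = [sorted({hap[j] for hap in haplotypes}) for j in range(k)]
    n = len(is_case)
    n_case = sum(1 for c in is_case if c)
    # Lower bound on the case frequency of any pattern that can reach threshold x.
    lb = n_case * n * x / ((n - n_case) * n + n_case * x)
    found = []
    pattern = ["*"] * k

    def check(start, i, gaps, gap_len):
        if pattern[i] != "*" and signed_chi2(pattern, haplotypes, is_case) >= x:
            found.append("".join(pattern))
        # Stop if any extension would exceed the maximum span.
        if i == k - 1 or i - start + 1 >= max_len:
            return
        # Stop if no extension can be strongly associated.
        if case_control_counts(pattern, haplotypes, is_case)[0] < lb:
            return
        for allele in alleles[i + 1]:          # case 1: fix marker i+1
            pattern[i + 1] = allele
            check(start, i + 1, gaps, 0)
        if pattern[i] != "*" and gaps < max_gaps and max_gap_size >= 1:
            pattern[i + 1] = "*"               # case 2: open a new gap
            check(start, i + 1, gaps + 1, 1)
        if pattern[i] == "*" and gap_len < max_gap_size:
            pattern[i + 1] = "*"               # case 3: extend the current gap
            check(start, i + 1, gaps, gap_len + 1)
        pattern[i + 1] = "*"                   # reset before returning

    for i in range(k):
        for allele in alleles[i]:
            pattern[i] = allele
            check(i, i, 0, 0)
            pattern[i] = "*"
    return found
```

The frequency bound lb is what keeps the recursion tractable: a pattern that is already too rare among case chromosomes cannot become strongly associated by being made more specific, so the whole subtree is skipped.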

7. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• (generalization) relation < for patterns in U, where the function e and the
relation < are such that if e(P) is true and P' < P, then e(P') is also true

Output

• set S = {P in U | e(P) is true} of patterns

Definitions

• function Lgg: U -> 2^U, Lgg(P) = {P' in U | P > P' and P' != P and there is no
P" in U such that P != P" != P' and P > P" > P'}, the set of least general
generalizations of pattern P.

• function Lss: U -> 2^U, Lss(P) = {P' in U | P < P' and P' != P and there is no
P" in U such that P != P" != P' and P < P" < P'}, the set of least special
specializations of pattern P.

Method

S := {}
Q := {}
// Start with the most general patterns:
F := {P in U | there is no P' in U, P' != P, such that P' < P};
while F != {} {
    // Evaluate the candidate patterns:
    foreach P in F {
        if e(P) = true then insert P into set S
        else remove P from set F
    }
    Q := Q union F
    // Generate a new set of candidate patterns:
    C := {}
    foreach P in F {
        C := C union {P' in U | P' in Lss(P) and for all P" in Lgg(P'):
                      P" in Q}
    }
    F := C
}
end
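As a non-limiting illustration, the level-wise search of claim 7 might be sketched in Python as follows. The function name search_levelwise, the callable is_generalization (standing in for the relation <) and the representation of U as a list of hashable patterns are assumptions of this sketch.

```python
def search_levelwise(universe, e, is_generalization):
    """Breadth-first search that only evaluates a pattern once all of its
    least general generalizations have been accepted.

    Assumes is_generalization(a, b) is True when a is strictly more general
    than b, and that e is monotone with respect to that relation.
    """
    def lss(p):
        # Least special specializations: direct successors of p.
        return [
            q for q in universe
            if is_generalization(p, q)
            and not any(
                is_generalization(p, r) and is_generalization(r, q)
                for r in universe
            )
        ]

    def lgg(p):
        # Least general generalizations: direct predecessors of p.
        return [
            q for q in universe
            if is_generalization(q, p)
            and not any(
                is_generalization(q, r) and is_generalization(r, p)
                for r in universe
            )
        ]

    selected, accepted = set(), set()
    frontier = {
        p for p in universe
        if not any(is_generalization(q, p) for q in universe)
    }
    while frontier:
        frontier = {p for p in frontier if e(p)}   # evaluate candidates
        selected |= frontier
        accepted |= frontier
        frontier = {                                # generate next level
            q for p in frontier for q in lss(p)
            if all(g in accepted for g in lgg(q))
        }
    return selected
```

Compared with a depth-first traversal, this ordering avoids evaluating a candidate when any of its direct generalizations has already been rejected, at the cost of holding a whole level of candidates in memory.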



8. A method of claim 1, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of marker patterns

• evaluation function e(P) for patterns P in U

• frequency threshold x

Output

• set S = {P in U | e(P) and ae(P) is true} of patterns, where ae(P) is true if and
only if the frequency of pattern P exceeds a given threshold x

Definitions

• function Lgg: U -> 2^U, Lgg(P) = {P' in U | P -> P' and P' != P and there is no
P" in U such that P != P" != P' and P -> P" -> P'}, the set of least general
generalizations of pattern P.

• function Lss: U -> 2^U, Lss(P) = {P' in U | P' -> P and P' != P and there is no
P" in U such that P != P" != P' and P' -> P" -> P}, the set of least special
specializations of pattern P.

Method

S := {}
Q := {}
// Start with the most general patterns:
F := {P in U | there is no P' in U, P' != P, such that P -> P'};
while F != {} {
    // Evaluate the candidate patterns:
    foreach P in F {
        if ae(P) = true then {
            if e(P) = true then insert P into set S
        }
        else remove P from set F
    }
    Q := Q union F
    // Generate a new set of candidate patterns:
    C := {}
    foreach P in F {
        C := C union {P' in U | P' in Lss(P) and for all P" in Lgg(P'):
                      P" in Q}
    }
    F := C
}
end



9. A method of claim 1, wherein

a) the phenotype being studied is qualitative, and

b) the pattern evaluation function e(P) has the form e(P) = true if and only if
e'(P) > x, where e'(P) is the (signed) association measure χ² and x is a
user-specified minimum value, which is chosen so that the sizes of S_i are large
enough, such as 20, to give statistically sufficiently reliable estimates for
the gene locus, and

c) the score s(m_i) of marker m_i is the size of S_i, also called marker-wise
pattern frequency of m_i and denoted by f(m_i).
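By way of illustration only, the marker-wise pattern frequency f(m_i) of claim 9 might be computed as sketched below. The sketch assumes that patterns are strings with '*' at unconstrained markers, and that a pattern "overlaps" a marker when the marker lies between the first and last constrained position of the pattern (gaps included); both the representation and this reading of "overlapping" are assumptions, as are the function names.

```python
def marker_pattern_frequency(patterns, n_markers):
    """Return f(m_i) for each marker: the number of accepted patterns whose
    span (first to last non-'*' position) covers marker i."""
    freq = [0] * n_markers
    for pattern in patterns:
        fixed = [j for j, p in enumerate(pattern) if p != "*"]
        if not fixed:
            continue  # a pattern with no constrained marker overlaps nothing
        for j in range(fixed[0], fixed[-1] + 1):
            freq[j] += 1
    return freq

def best_marker(scores, higher_is_better=True):
    """Index of the marker with the extreme score (first one on ties)."""
    pick = max if higher_is_better else min
    return pick(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions the predicted locus is simply the marker covered by the largest number of strongly associated patterns.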

10. A method of claim 1, wherein

a) the pattern evaluation function e(P) has the form e(P) = true if and only if
e'(P) > x, where e'(P) is the absolute frequency of pattern P in the data and
x is a user-specified value, which is chosen so that the sizes of S_i are large
enough, such as 20, to give statistically sufficiently reliable estimates for
the gene locus, and,

b) in order to derive the score s(m_i), the p value (statistical significance) of
each marker pattern P in determining the phenotype being studied is evaluated,
and

c) the score s(m_i) is the distance between the observed p value distribution of
patterns in S_i and the uniform distribution, defined as average of
(p_i - q_i) log (p_i / q_i) over all i = 1..n, where n is the number of haplotype
patterns in S_i, p_i is the i'th smallest p value in S_i, and q_i is the
expectation of the i'th smallest p value, if the p values were randomly drawn
from the uniform distribution.
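For illustration only, the distance of part c) might be computed as sketched below. The sketch relies on the standard fact that the expected i'th smallest of n independent uniform draws on (0, 1) is i/(n+1); the function name p_value_distance is hypothetical, and p values are assumed to be strictly positive so that the logarithm is defined.

```python
import math

def p_value_distance(p_values):
    """Average of (p_i - q_i) * log(p_i / q_i) over the sorted p values,
    where q_i = i / (n + 1) is the expected i'th smallest uniform draw."""
    n = len(p_values)
    total = 0.0
    for i, p in enumerate(sorted(p_values), start=1):
        q = i / (n + 1)
        total += (p - q) * math.log(p / q)
    return total / n
```

Each term is non-negative, because (p - q) and log(p / q) always share the same sign, so the score is zero only when the observed p values coincide with their uniform expectations and grows as they deviate.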

11. A method of claim 10, where the p value is computed using a linear model of
form Y = β_1 X_1 + ... + β_k X_k + αZ + β_0, where the dependent variable Y is the
phenotype being studied, X_1 through X_k are covariates, such as environmental
factors, and Z is a dummy variable for the occurrence of the haplotype pattern, and

the coefficients α and β_i are adjusted for best fit, and then

the significance of Z as a covariate is assessed using a t test with the null hypothesis
"α = 0".
12. A method of claim 1, wherein each score s(m_i) is refined by replacing it by the
marker-wise p value of the score s(m_i), where the statistical significance of s(m_i) is
measured against the null hypothesis that there is no gene effect.

13. A method of claim 12, wherein the marker-wise p values p(m_i) are determined
by randomly permuting phenotypes.
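As a non-limiting illustration, marker-wise p values obtained by permuting phenotypes might be sketched as follows. The callable score_markers (mapping a phenotype list to one score per marker), the default of 1000 permutations, the fixed seed, and the add-one correction in the final ratio are all assumptions of this sketch.

```python
import random

def permutation_p_values(score_markers, phenotypes, n_permutations=1000,
                         seed=0, higher_is_better=True):
    """Estimate, for each marker, how often a score at least as extreme as the
    observed one arises when phenotypes are randomly reassigned."""
    rng = random.Random(seed)
    observed = score_markers(list(phenotypes))
    extreme = [0] * len(observed)
    shuffled = list(phenotypes)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between genotype and phenotype
        permuted = score_markers(shuffled)
        for i, (obs, perm) in enumerate(zip(observed, permuted)):
            if (perm >= obs) if higher_is_better else (perm <= obs):
                extreme[i] += 1
    # Add-one correction keeps the estimate strictly positive.
    return [(count + 1) / (n_permutations + 1) for count in extreme]
```

Because the same permutations are applied to every marker, the resulting p values are comparable across markers and can replace the raw scores when the gene location is predicted by minimization.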

14. A method of claim 1, wherein the area returned from the prediction of the
gene location is contiguous or fragmented or a point.

15. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to
be the location of the marker m_i that maximizes or minimizes the marker score s(m_i).

16. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to
be the combination of most probable intervals for containing the trait-susceptibility
locus that covers at most the desired proportion t (t ∈ [0, 100%]) of the original region
obtained by taking all such points in the studied chromosomal region whose nearest
marker is within the k best scoring markers, where k is selected such that the
resulting area has length at most t times the length of the studied region, and where k is
maximal such value.
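For illustration only, the interval selection of claim 16 might be sketched as below. The sketch assumes that marker positions are numbers on a region given as a (start, end) pair, and that the points nearest to a marker form the interval between the midpoints to its neighbouring markers, clipped to the region; the function name predict_region and its parameters are hypothetical.

```python
def predict_region(positions, scores, region, t, higher_is_better=True):
    """Return the intervals nearest to the k best scoring markers, with k as
    large as possible while the total length stays within t * region length."""
    start, end = region
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    cells = {}
    for rank, i in enumerate(order):
        # Each marker owns the points closer to it than to any other marker.
        left = start if rank == 0 else (positions[order[rank - 1]] + positions[i]) / 2
        right = end if rank == len(order) - 1 else (positions[i] + positions[order[rank + 1]]) / 2
        cells[i] = (left, right)
    ranked = sorted(cells, key=lambda i: scores[i], reverse=higher_is_better)
    budget = t * (end - start)
    chosen, used = [], 0.0
    for i in ranked:
        length = cells[i][1] - cells[i][0]
        if used + length > budget:
            break  # adding the next best marker would exceed the allowed length
        chosen.append(cells[i])
        used += length
    return sorted(chosen)
```

Since the covered length can only grow as further markers are added in score order, stopping at the first marker that would exceed the budget yields the maximal k. The returned intervals may be adjacent or disjoint, which corresponds to the contiguous or fragmented areas of claim 14.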

17. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to
be those points in the studied chromosomal region whose nearest marker scores at least
y or at most y, where y is scoring function dependent and is selected so that the
probability of the gene being close to the marker is sufficiently large.

18. A method of claim 1, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is determined
by expert investigation of the marker scores or their visualization.

19. A method of claim 1, wherein several genes are searched for simultaneously
by using marker patterns that refer to several potential gene loci at the same time.

20. A computer-readable data storage medium having computer-executable program
code stored thereon operative to perform a method of any of preceding claims
when executed on a computer.

21. A computer system programmed to perform the method of any of claims 1 to
19.